In this section, we will see how to recognize handwritten digits with a simple neural network running on TensorFlow. This is an adaptation of the TensorFlow Tutorial.
So far, we have learned how to classify datapoints in two-dimensional space, such as a geolocation with latitude and longitude. In this section, we will extend the same technique to classify datapoints in n-dimensional space.
What do we mean by "classifying datapoints in n-dimensional space"? As an example, we will use images of handwritten digits from the MNIST dataset as the datapoints in n-dimensional space. MNIST is one of the most popular datasets used for learning neural network technology. It's like the "hello world" of neural networks.
In [ ]:
# import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
print(mnist.train.images.shape)
For example, if you print the values of the 5th image out of the 55,000 images, it looks like this:
In [ ]:
# check MNIST training images matrix data
sample_img = mnist.train.images[5].reshape(28, 28)
print(sample_img)
You can also plot the image with Matplotlib.
In [ ]:
import matplotlib.pyplot as plt
plt.imshow(sample_img).set_cmap('Greys')
Run the cell below to check the shape and the values in the label array.
In [ ]:
# check MNIST labels shape
print(mnist.train.labels.shape)
# show MNIST label data
print(mnist.train.labels[5])
In the example above, you have 1.0 at index 8. That means the label for the 5th image is "8". This is a so-called one-hot vector, a popular way of encoding labels in machine learning classification problems.
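For illustration, here is a minimal NumPy sketch of how such a one-hot vector can be built. The one_hot helper below is hypothetical, not part of the MNIST loader:
In [ ]:
# a minimal sketch of one-hot encoding with NumPy (this helper is hypothetical)
import numpy as np

def one_hot(label, num_classes=10):
    # row `label` of the identity matrix has 1.0 at that index and 0.0 elsewhere
    return np.eye(num_classes)[label]

print(one_hot(8))  # 1.0 at index 8, 0.0 everywhere else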
If you have 784 values in an array or a vector, it is called a "784-dimensional vector" in the context of machine learning. It's a vector in 784-dimensional space.
If you have X and Y values in 2D space, or X, Y, and Z values in 3D space, it's easy for humans to imagine what they look like. For example, if you have three values in a 3D vector that represent "you like movies a lot, you also like actors, and you like music a little", you can draw a vector in 3D space like this.
Meanwhile, we can't imagine what high-dimensional spaces and vectors look like once we go beyond 3D. You can't picture in your head what shape a vector in 784-dimensional space would have.
But there's a great tool to visualize that. Open TensorFlow Embedding Projector and follow the steps below.
- Select "MNIST images" as DATA at the top left
- Select "label" from the Color by menu
- Select the "T-SNE" tab in the middle of the left navigation

You will see the MNIST images slowly grouped into 10 clusters, one for each digit.
What's happening here? The tool uses an algorithm called t-SNE to do dimensionality reduction. That means you can "cast a shadow" of the n-dimensional space into 3D or 2D space. So what you are watching above is a shadow of the MNIST image vectors in 784-dimensional space, cast onto 3D space.
So, an image in the MNIST dataset is a 784-dimensional vector. And you can use a single neuron to classify whether each vector is an image of "1" or not. To do that, you can do the same thing we did with latitude and longitude: multiply the 784 values by weights and check whether the sum exceeds a certain threshold.
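As a rough sketch of that idea, the cell below builds a single neuron. The weights and bias here are random placeholders for illustration, not trained values:
In [ ]:
# a sketch of a single neuron with random, untrained weights (illustration only)
import numpy as np

img = mnist.train.images[5]       # a 784-dimensional image vector
w = np.random.randn(784) * 0.01   # hypothetical weights for a "1" detector
b = 0.0                           # hypothetical bias, acting as the threshold

# the neuron "fires" if the weighted sum plus bias exceeds zero
print(np.dot(img, w) + b > 0)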
To classify an image into 10 digits, we need a single layer neural network (a Perceptron) with 10 neurons. It would look like this. Here we have inputs from X1 to X784, multiplied by a bunch of weights to get 10 summation results, added to the 10 biases that work as thresholds. We'll see what "softmax" is later.
There is a great way in math to calculate this kind of "multiplying a set of numbers against another set of numbers" with a single formula: a matrix operation. So you can define a 784-dimensional vector for holding the image data, a 784 x 10 matrix for holding the weights, and a 10-dimensional vector for holding the biases. Then you can define a single layer neural network with a single formula by using a dot product between the weight matrix and the image vector.
You can also write this calculation in the following formula, where W is the weight matrix, x is the input vector, and b is the bias vector.
$${\Huge y=\mathrm{softmax}(Wx + b)}$$
This is why you see so many math formulas in neural network and machine learning textbooks: it's much easier to express these ideas as terse math formulas.
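Before looking at the TensorFlow version below, here is a NumPy sketch of the same matrix calculation, using random placeholder values. The names x_img, W_np, and b_np are chosen here just to avoid clashing with the TensorFlow names defined next; only the shapes matter:
In [ ]:
# a NumPy sketch of the single layer as one matrix operation (random values)
import numpy as np

x_img = mnist.train.images[5]           # 784-dimensional input vector
W_np = np.random.randn(784, 10) * 0.01  # hypothetical 784 x 10 weight matrix
b_np = np.zeros(10)                     # 10-dimensional bias vector

z = np.dot(x_img, W_np) + b_np          # ten weighted sums, one per digit
print(z.shape)                          # (10,)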
In [ ]:
# define a neural network (softmax logistic regression)
import tensorflow as tf
x = tf.placeholder(tf.float32, [None, 784]) # a placeholder for inputting the image
W = tf.Variable(tf.zeros([784, 10])) # weights
b = tf.Variable(tf.zeros([10])) # biases
y = tf.nn.softmax(tf.matmul(x, W) + b)
y
A Tensor is just an array. In physics, the word tensor is used for complex math calculations, but you can forget about that in the context of machine learning. A Tensor in TensorFlow is just a multi-dimensional array that can hold any high or low dimensional vectors and matrices of the input data, weights, biases, and so on.
So the name TensorFlow means that you can use the low-level API to define a flow (a graph) of calculations between vectors and matrices. In this case, we define the following computation graph.
- Uses the tf.placeholder method to define a Tensor x for accepting any number of 784-dimensional vectors. This will be used to receive the training image data
- Uses the tf.Variable method to define a Variable W for holding the weight matrix that has 784 x 10 values
- Uses the tf.Variable method to define a Variable b for holding the bias vector that has 10 values
- Uses the tf.matmul method to define a dot product between x and W, and calls the tf.nn.softmax method to define a softmax of that value. The resulting Tensor is named y

Please keep in mind that you are only defining the graph of calculations here, not executing them. You need to pass the graph definition to a Session to execute it. We will see how that works later in this codelab; the sketch below gives a quick preview.
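A minimal preview sketch, assuming the cells above have been run. Since W and b are still all zeros, the softmax output is a uniform 0.1 for every digit:
In [ ]:
# a minimal sketch: nothing is computed until a Session runs the graph
with tf.Session() as tmp_sess:
    tmp_sess.run(tf.initialize_all_variables())
    out = tmp_sess.run(y, feed_dict={x: mnist.train.images[:1]})
print(out)  # uniform 0.1 for each of the 10 digits, since W and b are zeros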
By training the network with the 55,000 images, you will have patterns of weights like the following. The blue area has positive weights, and the red area has negative weights.
You can see that the blue and red patterns work as "filters" for looking at each image. The network applies those filters to each image and checks how well it matches. If an image matches the filter (weights) for "8" well and the summation exceeds the threshold (bias) for "8", the network believes the image must be an "8".
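If you'd like to see those filters yourself, a sketch like the one below could draw the learned weights as ten 28x28 images; run it only after the training cell later in this codelab. The red-blue colormap is just an assumption to match the description above:
In [ ]:
# a sketch to visualize the trained weights as ten 28x28 "filters"
# (run this only after the training cell later in this codelab)
trained_W = sess.run(W)  # shape (784, 10)
for d in range(10):
    plt.subplot(2, 5, d + 1)
    plt.imshow(trained_W[:, d].reshape(28, 28), cmap='RdBu')
    plt.title(str(d))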
Let's see how it works by using 10 random values. Run the cell below to create 10 random values.
In [ ]:
import numpy as np
i = np.arange(0, 10)
n = np.random.randn(10)
plt.bar(i, n)
Then, run the cell below to calculate the softmax values for those random values.
In [ ]:
def softmax(n):
    return np.exp(n) / np.sum(np.exp(n))

s = softmax(n)
plt.bar(i, s)
print('The sum of softmax: ' + str(np.sum(s)))
As you can see, softmax normalizes the original values so you can compare them as probabilities from 0.0 to 1.0, and the sum of all the values is 1.0. So you can choose the single value closest to 1.0 as the final answer from the neural network.
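For reference, the softmax computed by the cell above can be written as:
$${\Huge \mathrm{softmax}(n)_i = \frac{e^{n_i}}{\sum_{j} e^{n_j}}}$$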
In [ ]:
# define the train step to minimize the cross entropy with SGD
y_ = tf.placeholder(tf.float32, [None, 10]) # the training labels
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
train_step
- Uses the tf.placeholder method to define a Tensor y_ for accepting any number of 10-dimensional vectors. This will be used to receive the training labels
- Uses tf.log and tf.reduce_sum to define a cross entropy calculation on the softmax result from the network. The resulting Tensor is named cross_entropy
- Uses the tf.train.GradientDescentOptimizer.minimize method to apply the Gradient Descent algorithm for training the network

When you train a model in machine learning, you define a loss function that evaluates how well the model is doing during training. In neural networks, one of the most popular loss functions is Cross Entropy, shown below.
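Written out, this is the same calculation as the cross_entropy line in the cell above, where y is the prediction from the network and y' is the vector of correct labels (y_ in the code):
$${\Huge H_{y'}(y) = -\sum_i y'_i \log(y_i)}$$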
The cross entropy function is like a teacher for training your model, measuring how much error your neural network is making.
(Diagram quoted from: TensorFlow and deep learning, without a PhD)
The formula simply means it returns a higher value when the network gives many wrong answers, and a lower value when it gives many correct answers.
Let's see how it works in practice. Run the cell below to define a label.
In [ ]:
# label
label = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0])
plt.bar(i, label)
Then, let's calculate a cross entropy value on the softmax values calculated in the last step. Run the cell below.
In [ ]:
def cross_entropy(x, _y):
    return -np.sum(_y * np.log(x))

cross_entropy
This defines the method cross_entropy. You can pass the output from the softmax as the parameter x, and pass the labels as _y, so that the method calculates the cross entropy value.
Now, run the cell below to emulate what would happen during the training.
In [ ]:
# simulate the training
cross_ent = []
for i in range(0, 100):
    cross_ent.append(cross_entropy(softmax(n), label))
    n[8] += 0.1
plt.plot(cross_ent)
In the code above, the loop manually increases the value of n[8], faking a training run so we can see how the cross entropy value changes as training proceeds. As you can see on the graph, the cross entropy function has a steep curve in the initial phase of training, meaning it returns a much bigger value when the answer from the network is wrong. That's why many people love it as a loss function: it makes neural network training faster.
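To make that concrete, here is a quick check that reuses the softmax, cross_entropy, and label defined above, comparing a confidently wrong answer with a confidently correct one. The logit values are made up for illustration:
In [ ]:
# made-up logits: a confident "3" (wrong) vs. a confident "8" (correct)
wrong = softmax(np.array([0, 0, 0, 3.0, 0, 0, 0, 0, 0, 0]))
right = softmax(np.array([0, 0, 0, 0, 0, 0, 0, 0, 3.0, 0]))
print(cross_entropy(wrong, label))  # large loss for the wrong answer
print(cross_entropy(right, label))  # small loss for the correct answer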
In [ ]:
# suppress warning messages
tf.logging.set_verbosity(tf.logging.ERROR)
# initialize variables and session
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
sess
The sess = tf.Session() line creates a new Session. A Session represents a runtime of TensorFlow where you can execute various TensorFlow operations for training, testing, and so on. With the TensorFlow low-level API, you follow the steps below to do training.

- Define a computation graph
- Create a Session for training the model
- Run the training with the Session

Why do we need this tedious procedure? Because TensorFlow is designed to be scalable and portable. You can specify different kinds of Sessions that represent the runtime on your local laptop, GPUs on a PC, or CPU/GPU/TPU on the cloud. So the whole design (defining a computation graph and passing it to a runtime) keeps the code independent of any particular runtime.
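As a hedged sketch of that flexibility, you can pin part of a graph to a specific device with tf.device; the device string below is an example and depends on your hardware:
In [ ]:
# a sketch: pinning graph nodes to a specific device (device string is an example)
with tf.device('/cpu:0'):
    const_a = tf.constant([1.0, 2.0])
    const_b = tf.constant([3.0, 4.0])
    summed = const_a + const_b
print(sess.run(summed))  # the same graph could run on a GPU or TPU runtime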
Next, let's start training. In this case we will use Stochastic Gradient Descent (SGD). That means that when you apply the Gradient Descent algorithm (defined with the tf.Operation train_step) to the training data, you randomly pick (that's why it's called "stochastic") 100 samples out of the 55,000 images and labels. These 100 random samples are called a mini batch. SGD makes the training much faster and more efficient than applying gradient descent to the entire training data. Run the cell below to try this in practice.
In [ ]:
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
print('Training Finished.')
- Uses mnist.train.next_batch to pick a mini batch of 100 random images and labels
- Calls the Session.run method to pass the train step definition and the mini batch

This is what was happening inside the DNNClassifier we used in section 2. You can define your own network and algorithm with the low-level API for sophisticated or complex applications.
In [ ]:
predict_label = tf.argmax(y, 1)
predicted_labels = sess.run(predict_label, {x: mnist.test.images})
print(predicted_labels)
- Uses tf.argmax to define a graph for finding the label that has the largest softmax value
- Calls Session.run to execute the calculation with the graph definition predict_label, where it uses mnist.test.images (the array of 10,000 test images) as x in the neural network. This returns an array with 10,000 predicted labels

Run the cell below and try changing the parameter to see prediction results for various labels.
In [ ]:
def show_image_and_predicted_label(x):
    print(" Correct label: " + str(np.argmax(mnist.test.labels[x])))
    print("Predicted label: " + str(predicted_labels[x]))
    plt.imshow(mnist.test.images[x].reshape(28, 28)).set_cmap('Greys')
    return x

show_image_and_predicted_label(20);
In [ ]:
is_prediction_correct = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
calc_accuracy = tf.reduce_mean(tf.cast(is_prediction_correct, "float"))
accuracy = sess.run(calc_accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})
print('Accuracy: ' + str(accuracy))
- Calls the tf.argmax method on y to pick the prediction results from the network, and on y_ to pick the correct labels. The tf.equal method compares them and generates a list of boolean values
- Casts the boolean values to floats and takes their mean with tf.reduce_mean to define calc_accuracy. The accuracy value is returned from the Session.run method at line 3 of the cell above
- Feeds the testing images to Tensor x and the testing labels to Tensor y_, and calls the Session.run method to run the test calculation flow

That's it!
In this session, you have seen how to use the TensorFlow low-level API to define a single layer neural network, train it, and use the network to classify the test images at about 92% accuracy.
Congrats! This is the end of this codelab. If you want to learn more about the advanced techniques you can use with TensorFlow, you may check the following page:
With these materials you can learn more about TensorFlow APIs, advanced neural network algorithms such as Convolutional Neural Networks, and several optimization techniques to get better accuracy when training models. Important note: please make sure to delete the Datalab instance and delete its Persistent Disk (PD) after finishing the codelab. If you leave Datalab running on the cloud, you may get charged for it.